Context-Aware Zero-Shot Recognition
We present a novel problem setting in zero-shot learning: zero-shot object
recognition and detection in context. In contrast to traditional zero-shot
learning methods, which simply infer unseen categories by transferring
knowledge from objects belonging to semantically similar seen categories, we
aim to identify novel objects in an image surrounded by known objects using an
inter-object relation prior. Specifically, we leverage the visual context and
the geometric relationships between all pairs of objects in a single image, and
capture information useful for inferring unseen categories. We integrate our
context-aware zero-shot learning framework
into the traditional zero-shot learning techniques seamlessly using a
Conditional Random Field (CRF). The proposed algorithm is evaluated on both
zero-shot region classification and zero-shot detection tasks. The results on
the Visual Genome (VG) dataset show that the additional visual context
significantly boosts our model's performance over traditional methods.
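The combination of per-region zero-shot scores with a pairwise relation prior can be sketched on a toy example. This is an illustrative stand-in, not the authors' CRF: the unary scores, the co-occurrence prior, and brute-force MAP inference below are all assumed for the sake of a small, checkable demo.

```python
import itertools
import numpy as np

# Toy sketch (assumed, not the paper's model): each region gets unary
# zero-shot scores; a pairwise prior rewards class pairs that tend to
# co-occur. MAP inference picks the joint labeling with maximum score.
unary = np.array([[0.1, 0.9],    # region 0: scores for classes A, B
                  [0.6, 0.4]])   # region 1: scores for classes A, B
# Pairwise prior: class pair (A, B) co-occurring gets a bonus.
pairwise = np.array([[0.0, 0.5],
                     [0.5, 0.0]])

def map_labeling(unary, pairwise):
    # Brute-force MAP over all joint labelings (fine for tiny examples;
    # a real CRF would use approximate inference).
    n, c = unary.shape
    best, best_score = None, -np.inf
    for labels in itertools.product(range(c), repeat=n):
        score = sum(unary[i, l] for i, l in enumerate(labels))
        score += sum(pairwise[labels[i], labels[j]]
                     for i in range(n) for j in range(i + 1, n))
        if score > best_score:
            best, best_score = labels, score
    return best

print(map_labeling(unary, pairwise))  # -> (1, 0)
```

Note how the pairwise bonus changes the answer: on unaries alone, region 1 would take class A regardless, but the joint labeling (B, A) beats (B, B) because the prior favors the mixed pair.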
Dynamic Proposals for Efficient Object Detection
Object detection is a fundamental computer vision task that localizes and categorizes
objects in a given image. Most state-of-the-art detection methods utilize a
fixed number of proposals as an intermediate representation of object
candidates, which is unable to adapt to different computational constraints
during inference. In this paper, we propose a simple yet effective method which
is adaptive to different computational resources by generating dynamic
proposals for object detection. We first design a module that enables a single
query-based model to run inference with different numbers of proposals.
Further, we extend it to a dynamic model to choose the number of proposals
according to the input image, greatly reducing computational costs. Our method
achieves significant speed-up across a wide range of detection models including
two-stage and query-based models while obtaining similar or even better
accuracy.
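The idea of adapting the proposal budget to the input can be sketched as follows. This is a minimal illustration under assumed names: the complexity score, the budget mapping, and the truncation rule are hypothetical, not the paper's actual module.

```python
import numpy as np

# Illustrative sketch (not the paper's code): map a predicted per-image
# "complexity" score in [0, 1] to a proposal budget, then truncate a
# score-ranked proposal list to that budget.
rng = np.random.default_rng(1)

def choose_num_proposals(complexity, min_p=10, max_p=100):
    # Simple linear budget: easy images keep few proposals, hard ones many.
    return int(round(min_p + complexity * (max_p - min_p)))

def dynamic_proposals(scores, boxes, complexity):
    k = choose_num_proposals(complexity)
    order = np.argsort(-scores)[:k]   # keep the k highest-scoring proposals
    return boxes[order], scores[order]

scores = rng.random(100)              # objectness scores for 100 proposals
boxes = rng.random((100, 4))          # matching boxes (x1, y1, x2, y2)
b, s = dynamic_proposals(scores, boxes, complexity=0.2)
assert b.shape == (28, 4)             # a low-complexity image keeps only 28
```

The compute saving comes from everything downstream of the proposal stage scaling with the budget, so a cheap complexity predictor can cut per-image cost on easy inputs.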
DQ-Det: Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation
Transformer-based detection and segmentation methods use a list of learned
detection queries to retrieve information from the transformer network and
learn to predict the location and category of one specific object from each
query. We empirically find that random convex combinations of the learned
queries are still good for the corresponding models. We then propose to learn a
convex combination with dynamic coefficients based on the high-level semantics
of the image. The generated dynamic queries, named modulated queries, better
capture the prior of object locations and categories in the different images.
Equipped with our modulated queries, a wide range of DETR-based models achieve
consistent and superior performance across multiple tasks including object
detection, instance segmentation, panoptic segmentation, and video instance
segmentation.
Comment: 12 pages, 4 figures, ICML 202
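The core operation, mixing a bank of learned query lists with image-dependent convex coefficients, can be sketched in a few lines. All names and shapes here are assumptions for illustration; the paper's coefficient network and query parameterization may differ.

```python
import numpy as np

# Hypothetical sketch: a bank of learned query lists is mixed with
# softmax (hence convex) coefficients predicted from a global image
# feature, yielding one "modulated" query list per image.
rng = np.random.default_rng(0)

num_queries, dim, num_basis = 100, 256, 4
# Stand-in for `num_basis` learned query lists, each (num_queries, dim).
query_bank = rng.normal(size=(num_basis, num_queries, dim))

def modulated_queries(image_feature, weight):
    logits = image_feature @ weight              # (num_basis,)
    coeffs = np.exp(logits - logits.max())
    coeffs /= coeffs.sum()                       # softmax -> convex weights
    # Mix the bank: sum_b coeffs[b] * query_bank[b]
    return np.einsum("b,bqd->qd", coeffs, query_bank)

image_feature = rng.normal(size=dim)             # e.g. pooled backbone feature
weight = rng.normal(size=(dim, num_basis))       # coefficient-head weights
q = modulated_queries(image_feature, weight)
assert q.shape == (num_queries, dim)
```

Because the coefficients are convex, the modulated queries stay inside the convex hull of the learned bank, which matches the abstract's observation that random convex combinations of learned queries already work well.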
Relabeling Minimal Training Subset to Flip a Prediction
When facing an unsatisfactory prediction from a machine learning model, it is
crucial to investigate the underlying reasons and explore the potential for
reversing the outcome. We ask: can we result in the flipping of a test
prediction by relabeling the smallest subset of the
training data before the model is trained? We propose an efficient procedure to
identify and relabel such a subset via an extended influence function. We find
that relabeling fewer than 1% of the training points can often flip the model's
prediction. This mechanism can serve multiple purposes: (1) providing an
approach to challenge a model prediction by recovering influential training
subsets; (2) evaluating model robustness via the cardinality of the relabeled
subset; we show that this cardinality is highly related to the noise ratio in
the training set and is correlated with, but complementary to, the predicted
probabilities; (3) revealing training points that lead to group attribution
bias. To the best of our knowledge, we are the first
to investigate identifying and relabeling the minimal training subset required
to flip a given prediction.
Comment: Under review
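The flavor of the procedure can be shown with a greedy stand-in on a tiny logistic-regression problem. To be clear, the paper uses an extended influence function to pick the subset; the nearest-point heuristic, trainer, and data below are all assumptions made so the example stays self-contained.

```python
import numpy as np

def train(X, y, epochs=300, lr=0.5):
    # Tiny full-batch logistic-regression trainer (stand-in model).
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def predict(w, x):
    return int(x @ w > 0)

def minimal_relabel_to_flip(X, y, x_test):
    # Greedy heuristic (NOT the paper's influence-function procedure):
    # relabel training points nearest to x_test, retrain after each,
    # and stop as soon as the test prediction flips.
    original = predict(train(X, y), x_test)
    y_new = y.copy()
    order = np.argsort(np.linalg.norm(X - x_test, axis=1))
    flipped = []
    for i in order:
        if y_new[i] != original:
            continue                  # only relabel points that support it
        y_new[i] = 1 - y_new[i]
        flipped.append(int(i))
        if predict(train(X, y_new), x_test) != original:
            return flipped
    return flipped

# 1-D points with a bias feature; labels split around x = 0.
X = np.array([[1., -2.], [1., -1.], [1., 1.], [1., 2.], [1., 3.]])
y = np.array([0., 0., 1., 1., 1.])
x_test = np.array([1., 0.5])
print(minimal_relabel_to_flip(X, y, x_test))  # -> [2]
```

Here relabeling the single nearest positive point moves the decision boundary past the test point, echoing the abstract's finding that very small subsets often suffice.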